Method and circuit for data encryption/decryption

ABSTRACT

Data are converted between an unencrypted and an encrypted format according to the Rijndael algorithm, including a plurality of rounds. Each round is comprised of a fixed set of transformations applied to a two-dimensional array, designated the state, of rows and columns of bit words. At least a part of said transformations are applied on a transposed version of the state, wherein rows and columns are transposed for the columns and rows, respectively.

FIELD OF THE INVENTION

[0001] The invention relates to encryption/decryption techniques and more specifically refers to Advanced Encryption Standard (AES) cryptosystems based e.g. on the so-called Rijndael algorithm.

[0002] The Rijndael algorithm is a block cipher algorithm operating on blocks of data. The algorithm reads an entire block of data, processes the block and then outputs the encrypted data. The Rijndael algorithm needs a key, which is another block of data. The proposed AES standard will include only 128-bit as standard length for plaintext blocks and 128, 192 and 256-bit as standard lengths for the key material.

DESCRIPTION OF THE PRIOR ART

[0003] For a general review of the Rijndael/AES algorithms reference may be made to the following documents/websites:

[0004] J. Daemen, V. Rijmen, “AES Proposal: Rijndael” www.nist.gov/aes;

[0005] J. Daemen, V. Rijmen, “The Block Cipher Rijndael” Smart Card Research and Applications, LNCS 1820, J.-J. Quisquater and B. Schneier, Eds., Springer-Verlag, 2000, pp. 288-296;

[0006] J. Daemen and V. Rijmen, “Rijndael, the advanced encryption standard”, Dr. Dobb's Journal, Vol. 26, No. 3, March 2001, pp. 137-139;

[0007] V. Rijmen, “Efficient Implementation of the Rijndael S-box” http://www.eas.kuleuven.ac.be/~rijmen/rijndael/;

[0008] J. Gladman “A specification for Rijndael, the AES Algorithm” March 2001 http://fp.gladman.plus.com/;

[0009] M. Akkar, C. Giraud “An implementation of DES and AES, secure against some attacks”, Proceedings of CHES 2001, pp. 315-325;

[0010] M. McLoone, J. V. McCanny “High performance single-chip FPGA Rijndael algorithm implementations”, Proceedings of CHES 2001, pp. 68-80;

[0011] V. Fischer, M. Drutarovsky “Two methods of Rijndael implementation in reconfigurable Hardware” Proceedings of CHES 2001, pp. 81-96;

[0012] H. Kuo and I. Verbauwhede “Architectural optimization for a 3 Gbits/sec VLSI Implementation of the AES Rijndael algorithm”, Proceedings of CHES 2001, pp. 53-67;

[0013] Rudra, P. K. Dubey, C. S. Jutla, V. Kumar, J. R. Rao, and P. Rohatgi “Efficient Rijndael encryption implementation with composite field arithmetic” Proceedings of CHES 2001, pp. 175-188;

[0014] A. Dandalis, V. K. Prasanna, J. P. D. Rolim “An adaptive cryptographic engine for IPSec architectures” Field-Programmable Custom Computing Machines, 2000 IEEE Symposium on, 2000, pp. 132-141;

[0015] “Advanced Encryption Standard (AES)” www.nist.gov/aes;

[0016] National Institute of Standard and Technology www.nist.gov/aes

[0017] Rijndael Home Page www.esat.kuleuven.ac.be/rijmen/rijndael/

[0018] Gladman Home Page http://fp.gladman.plus.com/

The encryption process based on the Rijndael algorithm follows the general layout shown in FIG. 1 of the enclosed drawings.

[0019] Unencrypted data are subject to a sequence of “rounds” R1, R2, . . . , R9, R10. Each round in turn provides for the application of a respective round key (i.e. round key 1, round key 2, . . . ) generated according to a key scheduling process KS.

[0020] Each generic round Ri develops along the lines shown in FIG. 2 and is essentially based on a first processing step currently referred to as the S-box step or function. This generates a matrix array which is subjected to a row shifting process followed by column mixing.

[0021] The respective key scheduled for round Ri is then added to produce the output of the round. The output of the final round (designated round 10 in FIG. 1) corresponds to the encrypted data.

[0022] More specifically, the first and last rounds are at least marginally different from the other rounds: the first round is in fact comprised of key addition only, while the last round does not provide for mix column transformation.
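By way of illustration only, the round layout described above can be summarised in a short Python sketch. The function name `round_plan` and the step labels are hypothetical, and the sketch assumes the usual AES layout in which the key-addition-only stage precedes rounds R1 to R10; it lists which steps run in which stage and performs no encryption.

```python
def round_plan(num_rounds=10):
    """Return the ordered list of steps for each stage of an encryption.

    Index 0 is the stage made of key addition only; indices 1..num_rounds
    are the rounds R1..R10 of FIG. 1 (assumed layout, see lead-in).
    """
    plan = [["add_key"]]  # initial stage: key addition only
    for r in range(1, num_rounds + 1):
        steps = ["s_box", "shift_row"]
        if r != num_rounds:
            steps.append("mix_column")  # the last round skips mix column
        steps.append("add_key")
        plan.append(steps)
    return plan


if __name__ == "__main__":
    for index, steps in enumerate(round_plan()):
        print(index, steps)
```

Only the last entry of the plan lacks the mix column step, which is why the hardware of FIG. 4 needs a bypass path for that round.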

[0023] The decryption algorithm of AES is very similar to the encryption process just described. The decryption process is essentially based on a sequence of steps reproducing in a complementary manner the sequence of steps of the encryption process, wherein each transformation is replaced by the respective inverse transformation.

[0024] All of the foregoing corresponds to basic principles and criteria well known to those of skill in the art (see, for instance, the references cited in the introductory portion of this description), thus making it unnecessary to provide a more detailed description herein. This applies in particular to the steps/functions designated “S-box” and “Add Key” in FIG. 2.

[0025] FIG. 3 is a schematic representation of a round in matrix form.

[0026] Apart from the add round key, sub byte and shift row operations, the application of a single round can essentially be described as the application to an array of input data ID of a matrix M to generate a corresponding array of output data OD. Data ID and OD are typically in 32-bit format, partitioned into four 8-bit words (bytes).

[0027] In current implementations of the Rijndael/AES algorithm, matrix M is thus a matrix including 4×4=16 elements s₀, . . . , s₁₅, each corresponding to a byte.

[0028] The block diagram of FIG. 4 shows a typical embodiment of an encryption system implementing the Rijndael/AES algorithm according to the traditional approach followed so far.

[0029] The system shown in FIG. 4, designated 10 overall, is intended to generate encrypted data ED starting from unencrypted data UD. Both unencrypted and encrypted data UD and ED are arranged in a 32-bit word format.

[0030] In the diagram of FIG. 4, reference numeral 12 designates a demux unit which distributes the input unencrypted data stream UD over four different paths leading to respective adder modules 14 a, 14 b, 14 c and 14 d where the first key addition is performed.

[0031] Reference numerals 24 a, 24 b, 24 c and 24 d designate respective sets of byte registers wherein the 32-bit words subjected to the first key addition are distributed over four byte registers to be subsequently fed to respective sets of modules 34 a, 34 b, 34 c and 34 d where the S-box processing takes place.

[0032] Reference 16 designates a module which implements the shift row operation. Data blocks resulting from row shifting are fed to respective mix column modules 18 a, 18 b, 18 c and 18 d.

[0033] These latter modules are intended to be bypassed during the last round. In fact the structure shown permits the first round to be calculated immediately. Iterative calculation is then carried out for the following rounds. As indicated, the last round does not provide for the mix column step, whereby lines are shown enabling such a step to be bypassed during the last round.

[0034] The data output from modules 18 a, 18 b, 18 c and 18 d, which are arranged over four parallel 8-bit words, are then fed to respective key addition modules 20 a, 20 b, 20 c and 20 d where the key addition operation is performed. After being subjected to key addition in modules 20 a, 20 b, 20 c and 20 d, data are loaded into final registers 22 a to 22 d from which the encrypted code words are fed to a multiplexer unit 26 to generate the encrypted data stream ED.

[0035] All of the foregoing again corresponds to principles and criteria which are known to those skilled in the art.

[0036] The main disadvantage of the prior art solutions exemplified by the arrangement shown in FIG. 4 lies in the complex circuitry required to implement the encryption/decryption mechanism. Such a disadvantage is particularly felt in those envisaged applications of cryptosystems adapted for use in embedded systems such as e.g. smartcards and the like.

[0037] One main object of the present invention is thus to provide an improved form of implementing the Rijndael/AES algorithm making it possible to expand the field of use of such algorithm in cryptosystems.

[0038] According to the present invention, this object, as well as additional objects which will become apparent from the following detailed description of a preferred embodiment of the invention, are achieved by means of a method and system having the features set forth in the annexed claims.

[0039] The arrangement of the invention can in fact be regarded as embodying a novel encryption method, which however can be rendered compatible with existing standards through initial and final transposition steps.

DETAILED DESCRIPTION OF THE DRAWINGS

[0040] The invention will now be described, by way of non-limiting example, by referring to the enclosed drawings, wherein:

[0041] FIGS. 1 to 4, exemplary of prior art approaches for implementing the Rijndael/AES algorithm, have been already described in the foregoing,

[0042] FIG. 5 is intended to highlight, by direct comparison to FIG. 3, the basic underlying mechanism of the present invention, and

[0043] FIG. 6 shows how the system shown in the block diagram of FIG. 4 is modified and simplified by resorting to the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

[0044] In order to better understand the basic underlying principle of the invention, it must be recalled that Rijndael is a secret key cryptographic algorithm working in block cipher mode. This means that it operates on blocks of data and not on single bits or bytes. The algorithm reads an entire block, processes it and then outputs the encrypted block. Decryption operates in a complementary way to re-obtain plaintext starting from encrypted data.

[0045] To operate properly, the Rijndael algorithm needs a key, which is another block of data.

[0046] The initial specification for this algorithm includes 128-bit, 192-bit and 256-bit as possible lengths for the plaintext blocks and for the key material. The prospected AES standard will expectedly include only 128-bit as standard length for plaintext blocks and 128, 192 and 256-bit as standard lengths for the key material.

[0047] The following description will therefore deal, by way of example only, with 128-bit blocks, as this adheres to the presently prospected standard.

[0048] The input, output and cipher key bit sequences are processed as arrays of bytes formed by dividing these sequences into groups of 8 contiguous bits (bytes). Internally, the operations of the AES algorithm are performed on a two-dimensional array of bytes called the state.

[0049] Specifically, by referring again to FIG. 3, matrix ID represents the input bytes, matrix M represents the state bytes, and OD designates the output bytes. The state consists of four rows of bytes, each row containing 4 bytes, thus making the state a 4×4 matrix.

[0050] The four bytes in each column of the state array M form 32-bit words, hence the state can also be interpreted as a one-dimensional array of 32-bit words (columns), where the column number provides an index into this array.
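The two views of the state described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the customary AES convention that the 16 input bytes fill the state column by column, with row 0 placed in the most significant byte of each column word.

```python
def bytes_to_state(block):
    """Arrange 16 input bytes into a 4x4 state, filling column by column."""
    if len(block) != 16:
        raise ValueError("expected a 16-byte block")
    # state[r][c] holds input byte number r + 4*c (assumed convention)
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_columns_as_words(state):
    """View the state as four 32-bit words, one per column."""
    words = []
    for c in range(4):
        w = 0
        for r in range(4):
            w = (w << 8) | state[r][c]  # row 0 ends up in the top byte
        words.append(w)
    return words


if __name__ == "__main__":
    state = bytes_to_state(bytes(range(16)))
    print([hex(w) for w in state_columns_as_words(state)])
```

The column number of the state is the index into the resulting list of words.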

[0051] As shown in connection with FIG. 2, the Rijndael cipher algorithm operates in rounds. Each round is a fixed set of transformations that are applied to the state.

[0052] The number of these rounds is chosen as a function of the key length. In the case of the three examples referred to in the foregoing, three possible key sizes of 128, 192 and 256 bits can be considered. Depending on these sizes, 10 rounds (as shown in FIG. 1), 12 rounds or 14 rounds are to be computed, respectively.
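The dependence of the round count on the key length can be expressed as a simple lookup. The sketch below is illustrative only; the names `ROUNDS_BY_KEY_BITS` and `rounds_for_key` are hypothetical, and 128-bit data blocks are assumed.

```python
# Number of rounds for each key size in bits, for 128-bit data blocks.
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def rounds_for_key(key):
    """Return the number of rounds for a key given as a bytes object."""
    bits = len(key) * 8
    if bits not in ROUNDS_BY_KEY_BITS:
        raise ValueError(f"unsupported key length: {bits} bits")
    return ROUNDS_BY_KEY_BITS[bits]
```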

[0053] The present invention is based on the unexpected recognition that using for the internal state array a transposed arrangement (that is, using matrix M′ in the place of matrix M, where the rows have been exchanged for the columns and vice-versa) leads to a surprising speed-up and simplification of the encryption/decryption process.

[0054] According to the prior art, an operation is applied to the columns, for instance column S₀ S₁ S₂ S₃ of matrix M.

[0055] When the state is transposed, the column becomes S₀ S₄ S₈ S₁₂.
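The effect of the transposition on the first column can be checked with a minimal Python sketch. The function name `transpose` is hypothetical, and the state is assumed to hold S₀ to S₁₅ column by column, so that the entry at row r and column c is S with index r + 4c.

```python
def transpose(state):
    """Exchange rows and columns of a 4x4 state."""
    return [[state[c][r] for c in range(4)] for r in range(4)]


def column(state, c):
    """Return column c of a 4x4 state as a list of four entries."""
    return [state[r][c] for r in range(4)]


if __name__ == "__main__":
    # Entries are the indices of S0..S15, laid out column by column.
    m = [[r + 4 * c for c in range(4)] for r in range(4)]
    print(column(m, 0))             # indices of the first column of M
    print(column(transpose(m), 0))  # indices of the first column of M'
```

Applying `transpose` twice returns the original state, which is what the final re-transposition relies on.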

[0056] This concept may be better understood by referring to the following example of a transformation carried out on a non-transposed state:

$$\begin{bmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{bmatrix}$$

[0057] where c is the column index which can be equal to 0, 1, 2, and 3.
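A minimal Python sketch of this column transformation is given below for illustration. The helper names `xtime` and `mix_single_column` are hypothetical; the multiplication by {02} is taken in GF(2⁸) modulo the AES polynomial x⁸ + x⁴ + x³ + x + 1, and multiplication by {03} is obtained as {02}·b XOR b.

```python
def xtime(b):
    """Multiply one byte by {02} in GF(2^8), reducing by 0x11B on overflow."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_single_column(col):
    """Apply the 4x4 mix column matrix to one column of four bytes."""
    s0, s1, s2, s3 = col

    def mul3(b):
        return xtime(b) ^ b  # {03}.b = {02}.b + b

    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


if __name__ == "__main__":
    print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
```

In this byte-oriented form each of the four columns has to be processed separately, which corresponds to the four mix column modules of FIG. 4.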

[0058] If a new, transposed form is used, the main transformation for the new mix column becomes

$y_0 = (\{02\} \cdot x_0) + (\{03\} \cdot x_1) + x_2 + x_3$

$y_1 = x_0 + (\{02\} \cdot x_1) + (\{03\} \cdot x_2) + x_3$

$y_2 = x_0 + x_1 + (\{02\} \cdot x_2) + (\{03\} \cdot x_3)$

$y_3 = (\{03\} \cdot x_0) + x_1 + x_2 + (\{02\} \cdot x_3)$

[0059] Transposed form: $x_i = [S_{0,i}\ S_{1,i}\ S_{2,i}\ S_{3,i}]$

[0060] where $x_i$, $0 \le i \le 3$, are the words of the transposed state, and $y_i$, $0 \le i \le 3$, are the words of the transposed state after mix column transformation.

[0061] In the foregoing, operator · means a multiplication in a Galois field applied to each of the four 8-bit terms comprising the 32-bit words being processed (i.e. $\{02\} \cdot x_0$ means $\{02\} \cdot S_{0,0}$, $\{02\} \cdot S_{1,0}$, $\{02\} \cdot S_{2,0}$, $\{02\} \cdot S_{3,0}$), while the operator + is a sum in Galois fields, i.e. a logic XOR between two 32-bit words.
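The word-oriented form of the mix column transformation may be sketched in Python as follows. This is a sketch under stated assumptions, not the applicants' implementation: the names `xtime_word` and `mix_columns_transposed` are hypothetical, each 32-bit word is assumed to pack four bytes of the transposed state, and the multiplication by {02} is done on all four bytes at once with masks.

```python
def xtime_word(w):
    """Multiply each of the four bytes packed in w by {02} in GF(2^8)."""
    high = w & 0x80808080            # top bit of every byte lane
    shifted = (w & 0x7F7F7F7F) << 1  # shift lanes without cross-lane carry
    return shifted ^ ((high >> 7) * 0x1B)  # reduce lanes that overflowed


def mix_columns_transposed(x0, x1, x2, x3):
    """Compute y0..y3 from x0..x3 using only word-wide shifts and XORs."""
    d0, d1, d2, d3 = (xtime_word(x) for x in (x0, x1, x2, x3))
    y0 = d0 ^ (d1 ^ x1) ^ x2 ^ x3  # {02}.x0 + {03}.x1 + x2 + x3
    y1 = x0 ^ d1 ^ (d2 ^ x2) ^ x3  # x0 + {02}.x1 + {03}.x2 + x3
    y2 = x0 ^ x1 ^ d2 ^ (d3 ^ x3)  # x0 + x1 + {02}.x2 + {03}.x3
    y3 = (d0 ^ x0) ^ x1 ^ x2 ^ d3  # {03}.x0 + x1 + x2 + {02}.x3
    return y0, y1, y2, y3


if __name__ == "__main__":
    # Four byte columns packed lane by lane into the words x0..x3.
    ys = mix_columns_transposed(0xDBF201C6, 0x130A01C6, 0x532201C6, 0x455C01C6)
    print([hex(y) for y in ys])
```

One call processes all sixteen bytes of the state, which is the property that allows a single mix column unit to replace four.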

[0062] Such a transposition requires a redefinition of most of the operations performed in a round of the algorithm, and also of the key schedule. Therefore, the round keys must also be transposed before being applied to a round providing for the use of a transposed state.

[0063] A trivial solution for that purpose is simply to apply the original key schedule unchanged and then add code to transpose every created round key. In that way, a large overhead would be introduced.
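The trivial solution can be illustrated with the following Python sketch, in which all function names are hypothetical and round keys are assumed to be given as 4×4 byte matrices. It shows the extra transposition applied to each key of an existing schedule, and that key addition on the transposed state with a transposed key agrees with the ordinary key addition.

```python
def transpose(m):
    """Exchange rows and columns of a 4x4 matrix."""
    return [[m[c][r] for c in range(4)] for r in range(4)]


def add_round_key(state, round_key):
    """XOR a 4x4 round key into a 4x4 state, byte by byte."""
    return [[state[r][c] ^ round_key[r][c] for c in range(4)] for r in range(4)]


def transpose_round_keys(round_keys):
    """The 'trivial' approach: transpose every key of an existing schedule."""
    return [transpose(k) for k in round_keys]


if __name__ == "__main__":
    state = [[(17 * r + 3 * c) & 0xFF for c in range(4)] for r in range(4)]
    key = [[(29 * r + 7 * c + 1) & 0xFF for c in range(4)] for r in range(4)]
    key_t = transpose_round_keys([key])[0]
    print(transpose(add_round_key(state, key)) == add_round_key(transpose(state), key_t))
```

The per-key transposition is pure overhead, which motivates computing the schedule directly in transposed form.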

[0064] For that reason, the preferred embodiment of the invention provides for the key schedule being applied directly in the transposed manner.

[0065] This means that the internal behaviour of the system is modified, and simplified, the only requirement to obtain compatibility with the standard being that the state must be re-transposed before being output.

[0066] The block diagram of FIG. 6 shows how the prior art arrangement shown in FIG. 4 is simplified and rendered faster by resorting to the invention.

[0067] In FIG. 6 parts and components which are identical or equivalent to those already described in connection with FIG. 4 have been indicated with the same reference numerals.

[0068] Essentially, the solution of the invention has a basic impact on the shift row block 16 and the mix column blocks 18 a, 18 b, 18 c and 18 d of FIG. 4.

[0069] In the solution of the invention, four shift column modules 16 a, 16 b, 16 c and 16 d, each acting on a respective flow from one of the S-box modules 34 a, 34 b, 34 c and 34 d, are substituted for shift row module 16.
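A possible software analogue of the four shift column modules is sketched below in Python. The names `rotl32` and `shift_columns_transposed` are hypothetical, and the sketch assumes that each column of the transposed state (i.e. each row of the original state) is held in one 32-bit word with original column 0 in the most significant byte, so that the row shift of the standard algorithm becomes a byte rotation of a single word.

```python
def rotl32(w, bits):
    """Rotate a 32-bit word left by the given number of bits."""
    bits %= 32
    return ((w << bits) | (w >> (32 - bits))) & 0xFFFFFFFF


def shift_columns_transposed(words):
    """Rotate word r left by r bytes; each word is handled independently."""
    return [rotl32(w, 8 * r) for r, w in enumerate(words)]


if __name__ == "__main__":
    # Word r packs row r of a state whose entries are numbered column by column.
    words = [0x0004080C, 0x0105090D, 0x02060A0E, 0x03070B0F]
    print([hex(w) for w in shift_columns_transposed(words)])
```

Because each word is rotated without reference to the others, the four rotations can proceed in parallel, one per flow.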

[0070] By referring to the two tables reproduced in the foregoing, it will become apparent that in the solution of the invention generation of each of the components y₀ y₁ y₂ y₃ essentially derives from a linear combination of words x₀ x₁ x₂ x₃. This makes it possible to implement the respective transformation simply by means of adder modules (and shift registers).

[0071] In the block diagram of FIG. 6 a single mix column module 18, jointly operating on all of the sixteen 8-bit words output from shift column modules 16 a, 16 b, 16 c, 16 d, is substituted for mix column modules 18 a, 18 b, 18 c and 18 d of the prior art arrangement of FIG. 4.

[0072] Experimentation carried out by the applicants demonstrates that the invention significantly increases the speed of implementing the Rijndael algorithm, even if the overhead due to the initial and final transpositions of the state array is taken into account.

[0073] Direct comparison of the solution of the invention with the so-called Gladman's implementation (reportedly the fastest software implementation of the Rijndael algorithm currently available) shows that the invention leads to improvements in terms of encryption and decryption speeds of 46% and 33%, respectively, for a 128-bit key size.

[0074] Improvements demonstrated in encryption and decryption speeds with a 192-bit key size are 39% and 25%, respectively.

[0075] Finally, improvements in encryption and decryption speed of 45% and 32%, respectively, were demonstrated for a 256-bit key size.

[0076] It will be appreciated that advantages in terms of latency are primarily felt at the level of software implementation, while the main advantage at the hardware level lies (even with identical performance in terms of latency) in the smaller amount of functional units required. This leads to simpler and less expensive systems, which is a particularly relevant factor in the case of decryption systems.

[0077] The solution of transposing the state matrix can be applied to all cases contemplated by the Rijndael algorithm, advantages being significant especially for 128 and 256 bit words. As indicated, if no initial and final transpositions to ensure compatibility with the existing standards are effected, a thoroughly novel cryptographic system is obtained.

[0078] The present invention has been described with reference to the preferred embodiments. However, the present invention is not limited to those embodiments. Various changes and modifications may be made within the spirit and scope of the appended claims.

1. A method of converting data between an unencrypted format and an encrypted format, wherein said data are organised in bit words, the method including a plurality of rounds, each round being comprised of a fixed set of transformations applied to a two-dimensional array, designated the state, of rows and columns of bit words, the method including the step of applying at least a part of said fixed set of transformations to a transposed version of said state, wherein said rows and columns are transposed for the columns and the rows, respectively.
2. The method of claim 1, wherein said bit words are 8-bit words or bytes.
3. The method of claim 1, wherein said state is a 4×4 matrix of bit words.
4. The method of claim 1, including 10, 12 or 14 rounds.
5. The method of claim 1, wherein said rounds involve the use of respective round keys, and wherein said round keys are subjected to transposition before being used in a respective round applied on a transposed state.
6. The method of claim 5, wherein said round keys are applied according to a round key schedule, the method including the step of applying said round key schedule as in the case of rounds providing for said set of transformations being applied to a non-transposed state and the step of adding code to transpose every round key thus created.
7. The method of claim 1, including the step of applying said round keys according to a transposed key schedule.
8. The method of claim 1, including the step of re-transposing said transposed state before outputting it.
9. A circuit for converting data between an unencrypted data format and an encrypted format, the circuit including registers for storing said data in the form of bit words as well as circuitry for implementing a plurality of rounds, each round being comprised of a fixed set of transformations applied to a two-dimensional array, designated the state, of rows and columns of bit words, wherein said circuitry is arranged to apply at least part of said fixed set of transformations to a transposed version of said state, wherein rows and columns are transposed for columns and rows, respectively.
10. The circuit of claim 9, wherein said registers are adapted for storing said bit words as 8-bit words or bytes.
11. The circuit of claim 9, wherein said circuitry is configured to operate on said state in the form of a 4×4 matrix of bit words.
12. The circuit of claim 9, wherein said circuitry is configured to implement 10, 12 or 14 rounds.
13. The circuit of claim 9, wherein said circuitry includes respective sets of S-box processing modules, each said set of S-box modules operating on a group of bit words corresponding to a cell of a column of said state.
14. The circuit of claim 13, wherein each said column is composed of four said cells.
15. The circuit of claim 9, including a plurality of respective sets of shift column modules, each set being adapted to perform a column shift operation on a column of said state.
16. The circuit of claim 15, including a single mix column module to perform column mix operations on the shift column data generated from the shift column modules of said plurality.
17. The circuit of claim 9, wherein the circuit is an encoder for converting data from an unencrypted data format into an encrypted data format.
18. The circuit of claim 17, wherein the circuit is an embedded system such as an integrated circuit in a smart card.
19. The circuit of claim 9, wherein the circuit is a decoder for converting data from an encrypted data format into an unencrypted data format.
20. The circuit of claim 19, wherein the circuit is an embedded system such as an integrated circuit in a smart card.